Polyketides and their synthesis

ABSTRACT

Biosyntheses of compounds whereof at least portions are polyketides produced by means of polyketide synthase (PKS) enzyme complexes are carried out after specific alterations have been made within the acyltransferase (AT) domains of the PKSs. Particular motifs in or near the substrate binding pocket are disclosed, such that alterations therein affect substrate specificity.

This application is a 371 of PCT/GB01/03642, filed Aug. 14, 2001, which claims the benefit of foreign priority of United Kingdom 00119986.9, filed Aug. 14, 2000.

TECHNICAL FIELD

The present invention relates to processes and materials (including enzyme systems, nucleic acids, vectors and cultures) which can be used to influence the selection of acylthioester units for the synthesis of polyketides, and to the resulting polyketides, which may be novel. It is particularly concerned with macrolides, polyethers or polyenes and their preparation making use of recombinant synthesis.

In preferred types of embodiment, polyketide biosynthetic genes or portions of them, which may be derived from different polyketide biosynthetic gene clusters, are manipulated to allow the production of specific polyketides, such as 12-, 14- and 16-membered macrolides, of predicted structure. The invention is particularly concerned with the modification of an Acyl CoA:ACP transferase (AT) function, generally by modifying genetic material encoding it in order to prepare polyketides with a predetermined ketide unit, e.g. incorporating (a) a malonate extender unit; or (b) a methylmalonate extender unit; or (c) an ethylmalonate extender unit; or (d) a further type of extender unit; or (e) an acetate and/or malonate starter unit; or (f) a propionate and/or methylmalonate starter unit; or (g) a butyrate and/or ethylmalonate starter unit; or (h) a further type of starter unit. Of course the invention can be used to influence more than one ketide unit of a polyketide. The method enables one to minimise alteration to the protein structure of the polyketide synthase.

Polyketides are a large and structurally diverse class of natural products that includes many compounds possessing antibiotic or other pharmacological properties, such as erythromycin, tetracyclines, rapamycin, avermectin, monensin, epothilone and FK506. In particular, polyketides are abundantly produced by Streptomyces and related actinomycete bacteria. They are synthesised by the repeated stepwise condensation of acylthioesters in a manner analogous to that of fatty acid biosynthesis. The structural diversity found among natural polyketides arises in part from the selection of (usually) acetate (malonyl-CoA) or propionate (methylmalonyl-CoA) as “starter” or “extender” units (although one of a variety of other types of unit may occasionally be selected); as well as from the differing degree of processing of the β-keto group formed after each condensation. Examples of processing steps include reduction to β-hydroxyacyl-, reduction followed by dehydration to 2-enoyl-, and complete reduction to the saturated acylthioester. The stereochemical outcome of these processing steps is also specified for each cycle of chain extension. Methylation at the α-carbon or β-hydroxy is also sometimes observed.

The biosynthesis of polyketides is performed by a group of chain-forming enzymes known as polyketide synthases. Two broad classes of polyketide synthase (PKS) have been described in actinomycetes. One class, named Type I PKSs, represented by the PKSs for the macrolides erythromycin, oleandomycin, avermectin, and rapamycin and by the PKS for the polyether monensin, consists of a different set or “module” of enzymes for each cycle of polyketide chain extension. For an example see FIG. 1 (Cortés, J. et al. Nature (1990) 348:176-178; Donadio, S. et al. Science (1991) 252:675-679; Swan, D. G. et al. Mol. Gen. Genet. (1994) 242:358-362; MacNeil, D. J. et al. Gene (1992) 115:119-125; Schwecke, T. et al. Proc. Natl. Acad. Sci. USA (1995) 92:7839-7843; also Patent application WO98/01546). The genes encoding numerous Type I PKSs have been sequenced and these sequences disclosed in publicly available DNA and protein sequence databases including Genbank, Swissprot and EMBL. For example, the sequences are available for the PKSs governing the synthesis of erythromycin (Cortés, J. et al. Nature (1990) 348:176-178, accession number X62569; Donadio, S. et al. Science (1991) 252:675-679, accession number M63677); rapamycin (Schwecke, T. et al. Proc. Natl. Acad. Sci. (1995) 92:7839-7843; accession number X86780); rifamycin (August, P. et al. Chem. Biol. (1998) 5:69-79; accession number AF040570) and tylosin (Eli Lilly, accession number U78289), among many others.

The term “polyketide synthase” (PKS) as used herein refers to a complex of enzyme activities responsible for the biosynthesis of polyketides. These enzyme activities include β-ketoacyl ACP synthase (KS), acyltransferase (AT), acyl carrier protein (ACP), β-ketoreductase (KR), dehydratase (DH), enoylreductase (ER) and thioesterase (TE) but are not limited to these activities. Each of these activities lies on a separate protein or polypeptide fragment responsible for this activity. Such a fragment is termed a “domain”. The terms “motif” or “signature sequence” used herein refer to a small stretch of amino acids (usually less than 10 amino acids) within a domain responsible (at least in part) for one aspect of the catalytic function, for example, choice of substrate. The term “extension module” as used herein refers to the set of contiguous domains, from a β-ketoacyl-ACP synthase (“KS”) domain to the next acyl carrier protein (“ACP”) domain, which accomplishes one cycle of polyketide chain extension; this may or may not include domains responsible for the reductive processing of the polyketide chain. The term “loading module” is used to refer to any group of contiguous domains that accomplishes the loading of the starter unit onto the PKS and thus renders it available to the KS domain of a specific extension module.

BACKGROUND ART

Several approaches to altering the nature of the polyketide product of a PKS by genetic engineering have been proposed: see particularly WO 93/13663 and WO 98/01571. The length of polyketide formed has been altered, in the case of erythromycin biosynthesis, by specific relocation using genetic engineering of the enzymatic domain of the erythromycin-producing PKS that contains the chain-releasing thioesterase/cyclase activity (Cortés, J. et al. Science (1995) 268:1487-1489; Kao, C. M. et al. J. Am. Chem. Soc. (1995) 117:9105-9106).

In-frame deletion of the DNA encoding part of the ketoreductase domain in module 5 of the erythromycin-producing PKS (also known as 6-deoxyerythronolide B synthase, DEBS) has been shown to lead to the formation of erythromycin analogues 5,6-dideoxy-3-α-mycarosyl-5-oxoerythronolide B, 5,6-dideoxy-5-oxoerythronolide B and 5,6-dideoxy, 6 β-epoxy-5-oxoerythronolide B (Donadio, S. et al. Science (1991) 252:675-679). Likewise, alteration of active site residues in the enoylreductase domain of module 4 in DEBS, by genetic engineering of the corresponding PKS-encoding DNA and its introduction into Saccharopolyspora erythraea, led to the production of 6,7-anhydroerythromycin C (Donadio, S. et al. Proc Natl. Acad. Sci. USA (1993) 90:7119-7123).

Patent application WO 00/01827 describes further methods of manipulating a PKS to change the oxidation state of the β-carbon. Substituting the reductive domain of module 2 of the erythromycin-producing PKS with domains derived from rapamycin PKS modules 10 and 13 led to the formation of C10-C11 olefin-erythromycin A and C10-C11 dihydroerythromycin A respectively.

The second class of PKS, named Type II PKSs, is represented by the synthases for aromatic compounds. Type II PKSs contain only a single set of enzymatic activities for chain extension and these are re-used as appropriate in successive cycles (Bibb, M. J. et al. EMBO J. (1989) 8:2727-2736; Sherman, D. H. et al. EMBO J. (1989) 8:2717-2725; Fernandez-Moreno, M. A. et al. J. Biol. Chem. (1992) 267:19278-19290). The “extender” units for the Type II PKSs are usually acetate (malonyl-CoA) units, and the presence of specific cyclases dictates the preferred pathway for cyclisation of the completed chain into an aromatic product (Hutchinson, C. R. and Fujii, I. Annu. Rev. Microbiol. (1995) 49:201-238). Hybrid polyketides have been obtained by the introduction of cloned Type II PKS gene-containing DNA into another strain containing a different Type II PKS gene cluster, for example by introduction of DNA derived from the gene cluster for actinorhodin, a blue-pigmented polyketide from Streptomyces coelicolor, into an anthraquinone polyketide-producing strain of Streptomyces galilaeus (Bartel, P. L. et al. J. Bacteriol. (1990) 172:4816-4826). Occasionally, unusual starter units are incorporated by Type II PKSs, particularly in the biosynthesis of oxytetracycline, frenolicin and daunorubicin, and in these cases a separate AT is used to transfer the starter unit to the PKS.

Fungal PKSs such as the 6-methylsalicylic acid or lovastatin PKS typically consist of a single multi-domain polypeptide which includes most of the activities required for the synthesis of the polyketide portion of these molecules (Hutchinson, C. R. and Fujii, I. Annu. Rev. Microbiol. (1995) 49:201-238). Type II fungal PKSs are also known.

A number of mixed systems comprising polyketide synthase and nonribosomal peptide synthase modules have been identified including the epothilone and bleomycin biosynthetic clusters.

Although large numbers of therapeutically important polyketides have been identified, there remains a need to obtain novel polyketides that have enhanced properties or possess completely novel bioactivity. The complex polyketides produced by Type I PKSs are particularly valuable, in that they include compounds with known utility as anthelmintic, insecticidal, anticancer, immunosuppressant, antifungal or antibacterial agents. Because of their structural complexity, such novel polyketides are not readily obtainable by total chemical synthesis, or by chemical modifications of known polyketides. Particular changes that are desired are: changes to the carbon skeleton by altering the nature of the starter and/or extender unit(s) incorporated; changes to the oxidation level of the β-keto carbon, and therefore the pattern of oxygen substituents, by altering the series of reductive steps that occur after chain extension; and changes to the post-PKS “tailoring” steps, which generally comprise hydroxylation, methylation or glycosylation of the polyketide molecule.

There is also a need to develop reliable and specific ways of deploying individual modules in practice so that all, or a large fraction, of the hybrid PKS genes that are constructed are viable and produce the desired polyketide product. Various strategies have been described to produce these hybrid PKSs, particularly utilising recombinant DNA technology and de novo biosynthesis. There is a particular need to develop methods of manipulating these PKSs in a manner that minimises the alteration to the PKS protein structure. Existing methods of achieving these manipulations sometimes produce hybrid PKS multienzymes which give the desired product at only 1% or less of the rate at which the unmodified PKS produces product.

WO 93/13663 and WO 98/01571 describe novel methods of engineering PKSs. A well-established method of altering the nature of the extender unit used at any position in the polyketide molecule, particularly malonyl-, methylmalonyl- or ethylmalonyl-CoA, is by domain substitution. For example, WO98/01546 and U.S. Pat. No. 6,063,561 disclose methods of accomplishing this modification to form modified erythromycins. Novel polyketide molecules, in this case particularly novel erythromycins, are produced by the replacement of an entire AT domain-encoding DNA fragment on the Saccharopolyspora erythraea chromosome with an equivalent heterologous AT domain-encoding fragment from another PKS cluster. It is well known to those skilled in the art that selection of the exact DNA/protein splice sites into which to insert the heterologous domain requires detailed analysis of the corresponding DNA and protein sequences. Different researchers choose to use splice sites at conserved, semi-conserved or non-conserved regions of the protein, or at sites either within or at the boundaries of the AT domains. A further drawback of this technique is that it is hard to predict whether a particular heterologous domain will work in any given context. A domain that works successfully in one module may not work at all in an adjoining module or may produce polyketides at a vastly reduced yield. Oliynyk, M. et al. (Chem. Biol. (1996) 3:833-839) and Ruan et al. (J. Bact. (1997) 179:6416-6425) have published studies that exchange a methylmalonyl-CoA specific AT domain for malonyl-CoA specific AT domains in modules of the erythromycin PKS. Products were observed only for changes in modules 1 and 2, with module 2 at a vastly lowered yield. Stassi et al. (Proc. Natl. Acad. Sci. (1998) 95:7305-9) exchanged the methylmalonyl-CoA specific AT of module 4 of the erythromycin PKS for an ethylmalonyl-CoA specific AT, and again product yield was low even after the addition of the crotonyl-CoA reductase gene thought to increase the supply of the required ethylmalonyl-CoA precursor. A possible reason for the limiting yields is the structural or mechanistic incompatibility of a heterologous AT domain with the adjoining KS and ACP domains with which it must interact properly for efficient polyketide chain synthesis. Consequently, it is often necessary to try multiple domain swaps to achieve a novel polyketide-producing strain that displays adequate efficiency, a process made particularly arduous when these changes must be made by gene replacement on the chromosome through a two-step double integration process. The introduction of splice sites at the DNA level is time consuming and technically challenging, requiring careful analysis to ensure the PKS protein coding reading frame is not disrupted. The introduction of restriction enzyme sites often requires changes at the amino acid level which lead to further PKS protein structure disruption and consequent loss of catalytic efficiency.

A method that could utilise the numerous techniques available for site directed mutagenesis to influence the AT substrate specificity with minimal disruption to the protein tertiary structure would be a valuable addition to the current techniques.

Changes to an active site have been shown to alter substrate specificity in other systems. For example, in an early study, Scrutton et al. (Nature (1990) 343:38-43) used site directed mutagenesis to switch the coenzyme substrate specificity of a glutathione reductase. By identifying and changing a ‘fingerprint’ structural motif in the NADP+ binding domain, they converted the enzyme into one displaying a marked preference for NAD+. The techniques of directed evolution have also been used to improve or change enzyme catalytic function. Of many examples in the literature, Zhang et al. (PNAS (1997) 94:4504-4509) illustrate the conversion of a galactosidase to a fucosidase by these techniques. The resulting protein bears 6 mutations, of which 3 lie in, or in close proximity to, the active site.

Minor but directed changes to a PKS domain can make significant changes to its catalytic function. Patent application WO 00/00500 teaches that an extender ketosynthase domain is converted to a decarboxylating (and hence loading) ketosynthase domain by site directed mutagenesis at the active site. U.S. Pat. Nos. 6,004,787 and 6,066,721 and Jacobsen et al. Science (1997) 277:367-369 describe the deletion of residues in the KS1 active site to inactivate this activity, allowing the production of novel polyketides by feeding synthetic precursors to the modified PKS.

Several studies have attempted to correlate the primary amino acid sequence of the AT with substrate specificity in order to determine the amino acids directly involved in the recognition of the appropriate substrate, and particularly the nature of the substrate side chain (i.e. the malonyl portion of the acyl-CoA thioester). Studies by Haydock et al. (FEBS Lett. (1995) 374:246-248) correlated the substrate specificity of malonyl- or methylmalonyl-CoA specific AT with a motif 11 amino acids upstream of the known active site. Comparisons between this motif and the protein structure of a known acyltransferase from E. coli fatty acid synthase allowed the authors to assess the proximity of the motif residues to the active site (and hence the ability of the motif to select the substrate). The authors acknowledged that “this divergent region thus identified lies near the acyltransferase active site though not close enough to make direct contact with the substrate”. Other studies (Katz, L. Chem Rev. (1997) 97:2557-2575, Tang, L. et al., Gene (1998) 216:255-265) have correlated additional residues with a specific extender unit, using these residues as a tool to predict the AT substrate specificity from a protein sequence derived from polyketide gene cluster sequencing projects. It has remained unclear which residues have mechanistic importance. In only one case have regions within the PKS AT domain been exchanged in an attempt to swap AT specificity: patent application WO 00/01838 and Lau et al. (Biochemistry (1999) 38:1643-51) implicated a ‘hypervariable region’ at the C-terminus of the AT domain in the selection of extender unit. These workers interchanged this 25-30 amino acid stretch and showed that this change was sufficient to alter the substrate specificity of the AT, concluding “a short (23-35 amino acid) C-terminal segment present in all AT domains is the principal determinant of their substrate specificity. Interestingly its length and amino acid sequence vary considerably among the known AT domains. We therefore suggest that the choice of extender units by the PKS modules is influenced by a “hypervariable region”, which could be manipulated via combinatorial mutagenesis to generate novel AT domains possessing relaxed or altered substrate specificity”. Surprisingly, our structural molecular modelling studies indicate that this region lies at a surface accessible region away from the active site and hence is unlikely to interact directly with (and hence directly select) the malonyl portion of the substrate used. The effect on substrate specificity is therefore likely to be imprecise and due to more indirect effects via, for example, disruption of tertiary structure.

DISCLOSURE OF INVENTION

According to a first aspect of the present invention there is provided a method of synthesising a compound whereof at least a portion is the product of a polyketide synthase (PKS) enzyme complex or is derived from such a product, said PKS enzyme complex including at least one acyltransferase (AT) domain. The method includes a step of providing said PKS enzyme complex in which said AT domain has been altered to change selectively a minor proportion of amino acid residues. The altered residue(s) may comprise one or more motifs which are present in the active site pocket of the AT domain and which influence the substrate specificity of the AT domain, the alteration affecting the substrate specificity; and/or one or more residues of a motif which influences the substrate specificity of the AT domain and which comprises a four-residue sequence corresponding to the YASH motif of the AT domain of the first module of DEBS, the alteration affecting the substrate specificity. Synthesis is then effected by means of said PKS enzyme complex to produce a compound or mixture of compounds different from what could have been produced by means of a PKS enzyme in which said AT domain had not been altered.

The PKS enzyme complex may be at least part of a modular type I PKS enzyme complex, or it may be derived from a type II PKS system, a fungal PKS system or a hybrid system comprising PKS and nonribosomal peptide synthase modules.

The present invention teaches that by altering a few amino acid residues in the AT domain and particularly residues close to the AT active site comprising one or more residues of a short signature “motif” within the AT domain it is possible to influence the acylthioester selected by that AT domain. Novel polyketides can be made by a modified PKS on which the signature motif on one or more modules is altered, e.g. being replaced with one associated with a different specificity for malonyl substrate. Furthermore, the invention provides a method of reducing the proportion of mixed polyketide products that are occasionally found in natural systems due to non-specific incorporation of the incorrect extender units. Conversely, the invention provides a method of giving a mixed population of polyketide products thus increasing the diversity of polyketides produced by a PKS.

The invention allows the preparation of a modified PKS by substitution of an existing amino acid residue motif in the AT that specifies incorporation of one of the common extender acylthioesters with another motif found in another AT specifying an alternative acylthioester. This alters the substrate specificity of the polyketide synthase when it is expressed in a polyketide-producing organism.

The DNA sequences have been disclosed for numerous Type I PKS gene clusters. Comprehensive sequence analysis of AT domains derived from Type I PKS modules responsible for the formation of macrolides, particularly erythromycin, rapamycin, avermectin, rifamycin, FK506, epothilone, tylosin, and niddamycin, ionophore polyethers, particularly monensin, and polyenes, particularly nystatin, allowed us to identify amino acids that are characteristic of AT domains.

FIG. 2 shows the sequence comparison of these AT domains. This sequence comparison has been generated in a generally conventional way, employing a computer using a procedure that creates a multiple sequence alignment from a group of related sequences. We used a program called PileUp (Wisconsin Package, Genetics Computer Group (GCG), Madison, Wis., USA), which creates a multiple sequence alignment using a simplification of the progressive alignment method of Feng and Doolittle (Journal of Molecular Evolution 25; 351-360 (1987)). The method used is similar to the method described by Higgins and Sharp (CABIOS 5; 151-153 (1989)). The program executes a series of progressive, pairwise alignments that allows a large number of sequences to be compared together to form a final alignment throughout all the sequences. Gaps can be inserted throughout individual sequences to allow alignment of regions of strong similarity. This is often required because strongly conserved regions are often separated by more variable regions, both in terms of numbers of amino acids and type of amino acids. Different programs use different mathematical algorithms to make these comparisons, resulting in alignments that differ in minor ways. However, it can be expected that regions of strong homology would still align whatever alignment program is utilised. The particular motifs that are discussed are marked.
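By way of illustration only, the pairwise step on which a progressive alignment is built can be sketched as a simple global (Needleman-Wunsch) alignment. This is not the PileUp implementation: the function name, the scoring values (match, mismatch, gap) and the toy peptide strings are assumptions chosen for the example, and a real alignment of AT domains would use a substitution matrix and tuned gap penalties.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Toy Needleman-Wunsch global alignment (assumed scoring, not PileUp's).

    Returns (score, aligned_a, aligned_b), with '-' marking inserted gaps.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First column and row: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the table: best of diagonal (match/mismatch), up (gap in b), left (gap in a).
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))
```

In this sketch a gap is opened wherever doing so lets the conserved residues on either side line up, which is the behaviour the text relies on when conserved regions are separated by variable stretches of differing length.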

These motifs include the conserved GQG motif that is close to the start of the domain, the GHS motif that contains the active site serine that covalently binds the acyl chain prior to transfer to the ACP, and an LPTY (SEQ ID NO:115) motif that is close to the end of the domain. Other residues are common to all ATs, including an arginine believed to stabilise the carboxylate group of the acylthioester. Further detailed sequence analysis allowed us to identify amino acid residues that differed between ATs responsible for the incorporation of malonyl-, methylmalonyl- and ethylmalonyl-CoA. Some of these amino acids or motifs had been previously identified during the sequence analysis of the clusters as previously discussed. While these motifs could predict whether malonyl- or methylmalonyl-CoA might be used, they generally fail to distinguish methylmalonyl-CoA from ethylmalonyl-CoA or the other larger extender units used. We viewed the ability to make this distinction as an important requirement for identification of the key residues involved in substrate recognition, and consequently of the residues most suitable for alteration. Closer analysis identified a string of four residues (location identified clearly in FIG. 2) of which two residues are virtually invariant throughout all ATs, and two residues differ consistently depending on the extender unit. Particularly, in the vast majority of ATs responsible for recognition of malonyl-CoA the sequence of residues HAFH (SEQ ID NO:117) was identified. In the majority of ATs responsible for recognition of methylmalonyl-CoA the equivalent segment was substituted by residues YASH (SEQ ID NO:114). In ATs responsible for ethylmalonyl-CoA or other similar sized CoA unit incorporation the overall motif was different and less conserved, but generally displayed the sequence XAGH (where X is most frequently but not limited to F, T, V or H; SEQ ID NO:116). We typically use the terms HAFH (SEQ ID NO:117), YASH (SEQ ID NO:114) and TAGH (SEQ ID NO:118) to describe these motifs with respect to malonyl-CoA, methylmalonyl-CoA and ethylmalonyl/further CoA specificity, but use these terms herein to allow substitutions in the motif, particularly at residue 1 as described. Potential substitutions and the exact location of the motif will be clear to those skilled in the art by inspection of FIG. 2 or similar sequence analysis.

There are three possible methods to locate the position of the motif within an AT sequence that does not appear in FIG. 2. It is likely a combination of the methods will be used.

I) By simple visual inspection and comparison of the sequence to identify the motifs HAFH, YASH or TAGH. Since substitutions of residue one are often encountered a useful procedure is to look for an alanine (A) separated by one amino acid (usually F, S or G) from a histidine (H).
II) By counting amino acids from the active site serine. The start of the motif is typically (but should not be limited to) between 90 and 100 amino acids downstream of the GHS active site motif.
III) By computer generated multiple alignment that allows the new sequence to be directly compared to the sequences and motifs we have annotated in FIG. 2 or to other ATs.

It is preferable to use the third method as this allows the motif to be identified unequivocally when there are substitutions within the motif. This is particularly necessary in some of the more unusual types of AT in which one of the residues can be substituted by proline (P). The third method will also identify the motif when the number of residues between the motif and the AT active site serine differs significantly from the norm. The third method will also better identify the motif when the same or similar string of amino acids occurs elsewhere in the domain.
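The first two methods can be combined in a short script, sketched here under stated assumptions. The function name, the default window of 80 to 110 residues, the choice to count from the serine of the first GHS match, and the synthetic test sequence are all illustrative; none of them comes from FIG. 2. The pattern looks for an alanine separated by one residue from a histidine, with any residue in the first position, and so will not find the more unusual proline-containing motifs, for which the third (alignment) method remains preferable.

```python
import re

# Any residue, A, any residue, H. The lookahead allows overlapping candidates.
MOTIF_PATTERN = re.compile(r"(?=(.A.H))")


def find_candidate_motifs(at_sequence, min_offset=80, max_offset=110):
    """Return (start, motif, offset) tuples for candidates downstream of GHS.

    start is a 0-based index into at_sequence; offset is counted from the
    serine of the first GHS match to the first residue of the candidate.
    The window defaults are assumptions for illustration.
    """
    ghs = at_sequence.find("GHS")
    if ghs < 0:
        return []  # no active-site motif found, nothing to anchor the count
    serine = ghs + 2
    hits = []
    for m in MOTIF_PATTERN.finditer(at_sequence):
        start = m.start(1)
        offset = start - serine
        if min_offset <= offset <= max_offset:
            hits.append((start, m.group(1), offset))
    return hits


# Synthetic sequence (not a real AT domain): GHS followed 95 residues later by YASH.
toy = "M" * 10 + "GHS" + "L" * 94 + "YASH" + "K" * 20
print(find_candidate_motifs(toy))
```

More than one candidate may be returned for a real sequence, in which case the multiple alignment is the way to decide between them.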

A particular feature of these motif residues is the relationship between the size of the third residue and the substrate selected. When malonyl-CoA is required the third residue is large (phenylalanine), when methylmalonyl-CoA is required this residue is intermediate (serine), and when ethylmalonyl-CoA is required this residue is small (glycine). The inverse relationship between substrate side chain size and this third residue is particularly noteworthy. Interestingly, this relationship applies also when considering the incorporation of the more unusual extender units such as methoxymalonyl-CoA, required for some cycles of chain extension during production of, for example, FK506 (HAGH; SEQ ID NO:119). Currently, only a single example of an AT responsible for the incorporation of a five carbon-CoA unit has been disclosed. In this case the AT displays a different motif at this point, CPTH (SEQ ID NO:120), in which only the histidine is conserved. The incorporation of a proline residue in the motif may be indicative of an AT specifying a larger substrate. Proline is also found in the motif in ATs that incorporate the larger unusual starter acids, as seen in the case of avermectin and soraphen. Residues in and around this area that lie in the active site of the AT domain define the specificity of the domain towards the substrate chosen.
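The trend described above can be written down as a small lookup, offered as a sketch rather than a validated predictor. The function and dictionary names are hypothetical, and the mapping encodes only what the text states: the third residue (F, S or G) suggests malonyl-, methylmalonyl- or ethylmalonyl-CoA (or another larger unit), and a proline anywhere in the motif is flagged as a possible sign of a larger substrate.

```python
# Illustrative lookup only: it restates the trend in the text, nothing more.
THIRD_RESIDUE_HINT = {
    "F": "malonyl-CoA (large third residue)",
    "S": "methylmalonyl-CoA (intermediate third residue)",
    "G": "ethylmalonyl-CoA or another larger unit (small third residue)",
}


def suggest_extender(motif):
    """Suggest a likely extender unit from a four-residue motif string."""
    if len(motif) != 4:
        raise ValueError("expected a four-residue motif")
    if "P" in motif:
        # Proline-containing motifs (e.g. CPTH) do not follow the simple trend.
        return "unusual motif containing proline; possibly a larger substrate"
    return THIRD_RESIDUE_HINT.get(motif[2], "no suggestion")


for m in ("HAFH", "YASH", "TAGH", "HAGH", "CPTH"):
    print(m, "->", suggest_extender(m))
```

A lookup of this kind looks only at the third residue, so it cannot by itself indicate that a module may accept more than one extender unit.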

Motifs that represent hybrids of the motifs for malonyl- and methylmalonyl-CoA, or for methylmalonyl- and ethylmalonyl-CoA, were identified. In particular, in epothilone module 3, where HAFH (SEQ ID NO:117) or YASH (SEQ ID NO:114) would be expected (malonyl-CoA or methylmalonyl-CoA specific), HASH (SEQ ID NO:121) was found; and in monensin module 5, where TAGH (SEQ ID NO:118) would be expected (ethylmalonyl-CoA specific), VAGH (SEQ ID NO:122) was found. Significantly, in both these cases the products of the PKS are a mixture due to the incorporation of 2 different extender units by the module containing the hybrid motif, causing formation of monensins A and B and epothilones A and B. However, it is known that substrate supply is a significant determinant of the proportion of monensins A and B formed (Liu, H. and Reynolds, K. A. (1999) J. Bact. 181:6806-6813).

Many of the previously-proposed “predictive” motifs are unlikely to be the principal determinant of substrate specificity because they are not located in the active site pocket. A particular requirement of any motif that can serve to distinguish between substrates is that it lies close to the active site and preferably within the substrate binding pocket. In this analysis we consider the substrate binding pocket to be the part of the pocket that binds/recognises the malonyl portion of the acylthioester rather than necessarily the coenzyme A portion. In all probability some of the similarities previously identified by sequence analysis are due to evolutionary conservation rather than a mechanistic requirement. In contrast, the residues we have identified lie in or close to the substrate binding pocket. To assess the exact location of the motif in space we compared the protein sequence of ATs derived from Type I PKS with that of E. coli fatty acid malonyl-CoA:ACP acyltransferase, for which there is a high resolution X-ray crystal structure (Serre, L. et al., J. Biol. Chem. (1995) 270:12961-12964). While the overall level of sequence similarity between these proteins is low, key residues (and particularly those with mechanistic importance) are conserved and the overall spatial arrangement of amino acids is expected to be conserved. Many groups have used this structure as a model AT and it is well known in the art that conservation of structure can be greater than the level of sequence conservation. Structural analysis showed that the identified motif would lie within the active site pocket, opposite the active site serine and the arginine thought to be involved in binding the substrate carboxylate, and close enough to the acyltransferase site to interact with the bound substrate side chain. The invariant histidine found in the motif is thought to be part of a catalytic triad with the active site serine, as is typically found in serine hydrolases (Serre et al., supra). FIG. 3 shows the position of the motif loop and important active site residues in the model AT structure.

Broadly the invention concerns modifying an AT domain by changing the four-residue sequence or motif responsible for selecting a substrate so that its specificity is altered. We may also change a small number of other residues close to the active site. Generally the total number of residues changed is less than 5% of the residues of the AT.
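The "less than 5%" guideline can be checked for a proposed variant with a few lines of code. The following is a minimal sketch assuming the original and modified AT sequences are the same length (substitutions only); the function name and the 100-residue toy sequences are hypothetical and are not taken from any sequence in FIG. 2.

```python
def fraction_changed(original, modified):
    """Fraction of positions at which two equal-length sequences differ."""
    if len(original) != len(modified):
        raise ValueError("sequences must be the same length (substitutions only)")
    if not original:
        raise ValueError("sequences must not be empty")
    changed = sum(1 for a, b in zip(original, modified) if a != b)
    return changed / len(original)


# Synthetic 100-residue example: swapping YASH for HAFH alters two positions.
wild_type = "L" * 96 + "YASH"
variant = "L" * 96 + "HAFH"
frac = fraction_changed(wild_type, variant)
print(f"{frac:.1%} of residues changed; under 5%: {frac < 0.05}")
```

Because the second and fourth residues of the motif are virtually invariant, a full motif exchange typically alters only the first and third positions, as in this toy case.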

The motif is the four-residue sequence corresponding to the YASH (SEQ ID NO:114) motif found at about residues 334-337 of the AT domain of the first module of DEBS, numbering as shown in FIG. 2. It lies in the active site pocket. It typically starts 80-110, more particularly 90-100, amino acids downstream of the GHS active site motif.

In a preferred embodiment of this invention polyketides of desired structure are produced by the replacement of an existing AT motif on a PKS with an alternative one responsible for selection of an alternative extender or starter unit, or responsible for an altered degree of selectivity (in most cases, increased selectivity). This may be carried out in one or more of the modules encoding a PKS cluster. One type of embodiment is a PKS including two adjoining domains, which are “naturally” adjoining or otherwise coupled domains, wherein the first of them is an AT domain where the four-residue motif has been altered to change its specificity, the AT domain acting to transfer a substrate to the second domain.

In one class of embodiments, this invention provides a PKS multienzyme or part thereof, or nucleic acid (generally DNA) encoding it, said multienzyme or part comprising a loading module and a plurality of extension modules for the generation of a polyketide, preferably selected from macrolides, polyethers or polyenes, wherein the loading or extension modules or at least one thereof contain a modified AT domain adapted to load and transfer an optionally substituted malonyl-CoA residue to (preferably) the ACP. The AT domain is preferably modified to alter its substrate specificity. This AT domain may differ from one naturally found in this position in the module only by the modification of a few amino acids lying in the active site. This modification comprises the exchange of all or part of a motif, particularly but not limited to HAFH with YASH or TAGH or vice versa. Optionally, alterations to amino acids outside this sequence, but preferably lying close to the AT active site, are made.

A second class of embodiments provides a method of synthesising polyketides having a desired extension unit at any point around the polyketide molecule by providing a PKS multienzyme incorporating one or more modified AT domains and particularly but not limited to an AT domain possessing the motif HAFH (SEQ ID NO:117) or YASH (SEQ ID NO:114) or TAGH (SEQ ID NO:118) where these motifs replace the existing natural motif. Optionally, alterations to amino acids outside this sequence, but preferably lying close to the AT active site, are made.

A third class of embodiments provides a method of synthesising polyketides having a desired starter unit by providing a PKS multienzyme incorporating a modified AT domain in the loading module and particularly (but not limited to) an AT domain possessing the motif HAFH (SEQ ID NO:117) or YASH (SEQ ID NO:114) or TAGH (SEQ ID NO:118) or a motif incorporating a proline residue where these motifs replace the existing natural motif. Optionally, alterations to amino acids outside this sequence, but preferably lying close to the AT active site, are made. Preferentially, this AT will follow a KSQ domain but other loading systems are known in the art (e.g. AT-ACP). Patent application WO 00/00500 describes some of the known loading systems. The modification of the loading module can be combined with similar modifications in other extension units.

A further class of embodiments provides a method of synthesising polyketides free of natural co-produced analogues and having a desired extender or loading unit by replacing an existing hybrid or alternative protein motif with the sequences HAFH (SEQ ID NO:117), YASH (SEQ ID NO:114) or TAGH (SEQ ID NO:118). It is particularly useful to make this alteration in the epothilone or monensin PKS gene cluster.

In still further aspects this invention provides a method of synthesising a mixed population of polyketides by providing a PKS multienzyme incorporating an AT with an altered or hybrid motif, particularly, but not limited to, HASH or VAGH. One particular utility of this method, though not limited to this utility, is the production of combinatorial libraries of compounds.

In a further aspect the PKS containing a modified AT domain may be spliced to a hybrid PKS produced for example as in WO 98/01546 and WO 98/01571 or WO 00/01827 or WO 00/00500. It is particularly useful to link such a modified PKS to gene assemblies that produce novel derivatives of natural polyketides, for example 14-membered macrolides.

Each of these aspects and classes of embodiment may involve providing nucleic acid encoding the polyketide synthase multienzyme and introducing it into an organism where it can be expressed. Suitable plasmids and host cells are described below. The polyketide synthase so produced or portions thereof may be isolated from the host cells by routine methods, though it is usually preferable not to do so. The host cells may also be capable of producing the required acylthioester, e.g. ethylmalonyl-CoA. It may be advantageous to remove the PKS from a strain with a particularly strong supply of an undesired acylthioester or express the altered PKS in a strain specifically chosen to have a strong supply of a particular acylthioester, or alternatively to develop media or growth conditions to enhance expression of the desired product. Conversely, such techniques could be used to promote formation of mixtures of products if so desired. It may also be beneficial to supply chemical precursors to the desired acylthioesters in the media, e.g. diethylethylmalonate or cyclobutane carboxylic acid. The host cells may also be capable of modifying the initial PKS products, e.g. by carrying out all or some of the biosynthetic modifications normal in the production of erythromycin (as shown in FIG. 4) and for other polyketides. Use may be made of mutant organisms such that some or all of the normal pathways are blocked, e.g. to produce products without one or more “natural” hydroxy groups or methyl groups or sugar groups.

The invention should not be limited to the exact motifs described. We have described some of the known variations within the motif, particularly at residue 1, as can be determined by inspection of FIG. 2 or by inspection of similar sequence data. However, other modifications can be envisaged; substitution of, for example, the phenylalanine in the malonyl-CoA motif by the similar sized tyrosine may still display the same selectivity. Similarly, the small residue glycine found in the motif responsible for ethylmalonyl-CoA/other extender incorporation may be substituted by, for example but not limited to, alanine. It is well known to those skilled in the art that these and other similar conservative substitutions frequently maintain the same selectivity. Similarly, the serine residue found in the motif for incorporation of methylmalonyl-CoA could be substituted by a residue intermediate in size and/or displaying a similar charge distribution.
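The motif/substrate associations referred to in this description can be collected in a small lookup table. The Python sketch below is only a reading aid: the dictionary and function names are hypothetical, the wording of the labels is ours, and a motif that is absent from the table is reported as unlisted rather than guessed at.

```python
# Associations as described in the text; labels are paraphrases, not claims.
MOTIF_SPECIFICITY = {
    "HAFH": "malonyl-CoA",
    "YASH": "methylmalonyl-CoA",
    "TAGH": "ethylmalonyl-CoA or other extender",
    "HASH": "hybrid motif (mixed products)",
    "VAGH": "hybrid motif (mixed products)",
}


def describe_motif(motif):
    """Look up a four-residue motif; unknown motifs are flagged, not inferred."""
    return MOTIF_SPECIFICITY.get(
        motif.upper(), "not listed; inspect FIG. 2 or the structural model"
    )


for m in ("YASH", "HAFH", "HAYH"):
    print(m, "->", describe_motif(m))
```

The conservative substitutions suggested above (for example tyrosine for phenylalanine) are deliberately left out of the table, since the text presents them as possibilities rather than as established motifs.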

The invention should not be limited to changes only in this motif. Alterations to other residues around the AT domain may also be required to increase the level of specificity or catalytic efficiency, i.e. to increase the proportion or amounts of the desired products. These residues are preferentially close to the substrate binding pocket. The requirement for these additional alterations will depend on the particular context or change desired. Particular residues to alter can be readily identified by inspection of FIG. 2 or other similar sequence analysis data or alternatively by analysis of the structural model.

Residues that may be altered in addition to the motif can be divided into two classes. Some of these residues may have been previously identified in the motifs used to predict the specificity of a motif (i.e. Haydock et al., FEBS Lett. (1995) 374:246-248). These residues are preferentially close to the substrate-binding pocket. These residues should not be limited to the particular examples described.

I) The first class of potential residues to change includes residues close to the motif on the polypeptide chain. A particular example is the residue immediately after the 4 residue motif described in the present invention. In malonyl-CoA specific ATs this residue is generally serine (S), i.e. the protein sequence at this point is generally HAFHS (SEQ ID NO:123), whereas in methylmalonyl-CoA specific ATs this residue can be S but can also be T, G, or C for example. Thus to change a methylmalonyl-CoA specific AT to a malonyl-CoA specific AT by changing the signature motif it may be beneficial also to ensure that the residue immediately after the motif is an S. Since this residue is close to the motif on the polypeptide chain it lies close to the substrate binding pocket.

II) The second class includes residues that are close to the motif or active site in space. These residues are best identified by reference to the model AT structure described previously or another AT structure that may be subsequently derived. It is known to those skilled in the art that it is possible to thread related protein sequences into an existing structure by using molecular modelling or related techniques. Alternatively, an acylthioester may be modelled into the active site. These are the preferred methods, but often simple inspection of the existing structure, using the highly conserved motifs as a reference point, gives a reasonable approximation.

A particular example of a residue close in space to the motif that might be changed is the residue immediately after the GHS active site motif. In methylmalonyl-CoA specific ATs this residue is generally glutamine (Q), i.e. the protein sequence at this point is GHSQ (SEQ ID NO:124), whereas in malonyl-CoA specific ATs this residue is often V, I or L for example. Thus to change a malonyl-CoA specific AT to a methylmalonyl-CoA specific AT by changing the signature motif it may be beneficial also to ensure that the residue immediately after the GHS motif is a Q. Since this residue is close to the active site serine it lies within the substrate-binding pocket.

A further example of a residue close in space that might be altered is the residue lying three residues downstream of the GQG motif. In methylmalonyl-CoA specific ATs this residue is generally tryptophan (W), i.e. the protein sequence at this point is GQGXXW (SEQ ID NO:125), whereas in malonyl-CoA specific ATs this residue is often R, H or T for example. Thus to change a malonyl-CoA specific AT to a methylmalonyl-CoA specific AT by changing the signature motif it may be beneficial also to ensure that this particular residue after the GQG motif is a W. Analysis of the model AT structure shows that the GQG motif lies close to the active site pocket and consequently so does this tryptophan.

A further example of a residue close in space that might be altered is the residue 4 residues downstream from the conserved arginine referred to above, which is believed to stabilise the carboxylate group of the acylthioester substrate. In malonyl-CoA specific ATs this residue downstream of the R is generally methionine (M), i.e. the protein sequence at this point is RXXXMQ. In methylmalonyl-CoA specific ATs this residue is generally I or L, and in ethylmalonyl-CoA specific ATs it is often W. Thus, for example, to change a methylmalonyl-CoA specific AT to a malonyl-CoA specific AT by changing the signature motif it may be beneficial also to ensure that this particular residue is a methionine. Analysis of the model AT structure shows that this residue lies close to the active site pocket.
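The additional positions discussed in classes I) and II) above can be summarised as a table of residues that are generally or often found for each substrate class. The following Python sketch is a minimal illustration: the position labels and function name are hypothetical, and the table encodes tendencies from the text (which uses "generally" and "often"), not strict rules.

```python
# Residues reported in the text for each auxiliary position and substrate class.
AUXILIARY_RESIDUES = {
    "after_motif": {"malonyl": "S", "methylmalonyl": "STGC"},
    "after_GHS": {"malonyl": "VIL", "methylmalonyl": "Q"},
    "GQG_plus_3": {"malonyl": "RHT", "methylmalonyl": "W"},
    "Arg_plus_4": {"malonyl": "M", "methylmalonyl": "IL", "ethylmalonyl": "W"},
}


def positions_to_review(observed, target):
    """Return auxiliary positions whose residue is atypical for the target class.

    observed maps a position label to the one-letter residue found there;
    the returned dict maps each flagged position to the typical residues.
    """
    flagged = {}
    for position, residue in observed.items():
        typical = AUXILIARY_RESIDUES[position].get(target)
        if typical is not None and residue not in typical:
            flagged[position] = typical
    return flagged


# Hypothetical methylmalonyl-type AT that is to be converted to malonyl-CoA use.
observed = {"after_motif": "T", "after_GHS": "Q", "GQG_plus_3": "W", "Arg_plus_4": "L"}
print(positions_to_review(observed, "malonyl"))
```

A flagged position is only a candidate for alteration; whether a change is needed will depend on the particular context, as noted above.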

In further aspects the present invention provides vectors, such as plasmids or phages (preferably plasmids), including nucleic acids as defined in the above aspects and host cells particularly Saccharopolyspora or Streptomyces species transformed with such nucleic acids or constructs. It will be readily apparent to those skilled in the art that there are multiple molecular biological methods for achieving the desired alterations to the AT domain, particularly at the nucleic acid level, e.g. techniques of site directed mutagenesis or directed evolution. Suitable plasmid vectors and genetically engineered cells suitable for expression of PKS genes with modules incorporating an altered AT domain can readily be designed or selected by those skilled in the art. They include those described in WO 98/01546 as being suitable for expression of hybrid PKS genes of Type I. Examples of effective hosts are Saccharopolyspora erythraea, Streptomyces coelicolor, Streptomyces avermitilis, Streptomyces griseofuscus, Streptomyces cinnamonensis, Streptomyces fradiae, Streptomyces longisporoflavus, Streptomyces hygroscopicus, Micromonospora griseorubida, Streptomyces lasaliensis, Streptomyces venezuelae, Streptomyces antibioticus, Streptomyces lividans, Streptomyces rimosus, Streptomyces albus, Amycolatopsis mediterranei, and Streptomyces tsukubaensis. These include hosts in which SCP2*-derived plasmids are known to replicate autonomously, such as for example S. coelicolor, S. avermitilis and S. griseofuscus; and other hosts such as Saccharopolyspora erythraea in which SCP2*-derived plasmids become integrated into the chromosome through homologous recombination between sequences on the plasmid insert and on the chromosome; and all such vectors which are integratively transformed by suicide plasmid vectors. A plasmid with an int sequence will integrate into a specific attachment site on the host's chromosome.

It is apparent to those skilled in the art that the overall sequence similarity between nucleic acids encoding comparable AT domains from Type I PKSs is sufficiently high and the domain organisation of different Type I PKSs so consistent between different polyketide-producing organisms, that the processes for obtaining novel hybrid polyketides described will be generally applicable to all natural modular Type I PKSs or their derivatives.

The present invention will now be illustrated, but is not intended to be limited, by means of some examples.

Amino acids have been defined throughout by their standard one letter codes as follows. A—alanine, R—arginine, N—asparagine, D—aspartic acid, C—cysteine, Q—glutamine, E—glutamic acid, G—glycine, H—histidine, I—isoleucine, L—leucine, K—lysine, M—methionine, F—phenylalanine, P—proline, S—serine, T—threonine, W—tryptophan, Y—tyrosine and V—valine.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a diagram showing the functioning of 6-deoxyerythronolide B synthase (DEBS), a modular PKS producing 6-deoxyerythronolide B, a precursor of erythromycin A.

FIG. 2 a-v gives the amino acid sequence comparison of the AT domains of representative Type I PKS gene clusters (SEQ ID NOs: 1-112). The motifs GQG, GHS and LPTY (SEQ ID NO: 115) are marked at the base of the figure along with the arginine (at position 252 in the sequence alignment, which corresponds to residue 144 of SEQ ID NO:26) and the motif defined in the invention as defining specificity. The abbreviations used at the side to define the PKS used are: ave: avermectin, debs: 6-deoxyerythronolide B synthase or erythromycin, epo: epothilone, sor: soraphen, fkb: FK506, rap: rapamycin, tyl: tylosin, mon: monensin, nid: niddamycin, nys: nystatin, rif: rifamycin. The numbers represent the module number. The letter a at the end of the designation indicates malonyl-CoA specific AT, the letter p indicates methylmalonyl-CoA specific AT, and the letter b indicates ethylmalonyl-CoA specific AT. Further types of AT with unusual or ill-defined AT specificity are indicated with letter x. Due to the number of sequences considered, in the pileup each section of 50 amino acids spreads over two pages. The sequences of the monensin ATs are unpublished. They are set out in PCT/GB00/02072.

FIG. 3 shows a three-dimensional representation of the active site of the E. coli acyltransferase. The spatial arrangement of the motifs described in the text is shown by arrows, with the relevant atoms shown in bold.

FIG. 4 shows the enzymatic steps that convert 6-deoxyerythronolide B into erythromycin A in Saccharopolyspora erythraea.

FIG. 5 shows the DNA sequence from the monensin PKS encoding the loading AT used in Example 8 (SEQ ID NO:113).
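The labelling convention described for FIG. 2 (cluster abbreviation, module number, specificity letter) can be sketched as a small parser. In the Python sketch below the function name and the regular expression are hypothetical, the avermectin abbreviation is taken to be ave, and the letter a for malonyl-CoA specific ATs is an assumption made by analogy with p and b; labels that do not follow the simple pattern (for example loading domains, if labelled differently) are rejected rather than interpreted.

```python
import re

CLUSTERS = {
    "ave": "avermectin",
    "debs": "6-deoxyerythronolide B synthase (erythromycin)",
    "epo": "epothilone",
    "sor": "soraphen",
    "fkb": "FK506",
    "rap": "rapamycin",
    "tyl": "tylosin",
    "mon": "monensin",
    "nid": "niddamycin",
    "nys": "nystatin",
    "rif": "rifamycin",
}

SPECIFICITY = {
    "a": "malonyl-CoA",  # assumed letter, see lead-in
    "p": "methylmalonyl-CoA",
    "b": "ethylmalonyl-CoA",
    "x": "unusual or ill-defined",
}


def parse_designation(label):
    """Split a label such as 'debs1p' into (cluster, module number, specificity)."""
    match = re.fullmatch(r"([a-z]+)(\d+)([apbx])", label.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised designation: {label!r}")
    cluster, module, code = match.groups()
    return CLUSTERS.get(cluster, cluster), int(module), SPECIFICITY[code]


print(parse_designation("debs1p"))
```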

MODES FOR CARRYING OUT THE INVENTION

Example 1

Construction of Plasmid pHP41

Plasmid pHP41 is a pCJR24-based plasmid containing the DEBS1 PKS gene comprising a loading module, the first and second extension modules of DEBS and the chain terminating thioesterase. The motif YASH (SEQ ID NO:114) of the AT domain of the first module has been altered to HAFH (SEQ ID NO:117). Plasmid pHP41 was constructed via several intermediate plasmids as follows. Plasmid pD1AT2 (Oliynyk, M. et al. Chem. Biol. (1996) 3:833-839) was digested with NdeI and XbaI. A ~11 kbp fragment was isolated by gel electrophoresis and the DNA purified from the gel. This fragment was ligated into pCJR24 (Rowe, C. J. et al. Gene (1998) 216:215-223) that had been linearised by digestion with NdeI and XbaI and treated with alkaline phosphatase. The ligation mixture was used to transform electrocompetent E. coli DH10B cells and individual clones checked for the desired plasmid pCJR26. Plasmid pCJR26 was identified by restriction pattern. pCJR26 was transformed into E. coli strain ET12567 (McNeil, D. J. et al. Gene (1992) 111:61-68) and an individual colony grown overnight to isolate demethylated DNA. This DNA was linearised using MscI and AvrII and the ~13 kb fragment (Fragment A) isolated by gel electrophoresis and purification from the gel.

A DNA segment of the eryAI gene (start nucleotide 45368, end nucleotide 34734) from S. erythraea extending from nucleotide 42104 to nucleotide 41542 was amplified by PCR using the following oligonucleotide primers; 5′-TTTTTTTGGCCAGGGTTGGCAGTGGGCGGGCA-3′ (SEQ ID NO:127) and 5′-TTTTTACGGCCAGCCGCTTGGCGCGGAT-3′ (SEQ ID NO:128). The DNA from a plasmid designated pCJR65 derived from pCJR24 and DEBS1TE was used as a template. The design of the primers introduced an MscI site at nucleotide 42105, and the second primer primed across a BstXI site at position 41546. The 574 bp PCR product was treated with T4 polynucleotide kinase and ligated to plasmid pUC18 that had been linearised by digestion with SmaI and then treated with alkaline phosphatase. The ligation mixture was used to transform electrocompetent E. coli DH10B and individual clones checked for the presence of the desired plasmid pHP39. Plasmid pHP39 was identified by restriction pattern and sequence analysis. Demethylated DNA was produced by transforming E. coli strain ET12567 with plasmid DNA. The resulting DNA was linearised by digestion with MscI and BstXI and the resulting 552 bp fragment (Fragment B) isolated by gel electrophoresis and purified from the gel. A DNA segment of the eryAI gene from S. erythraea extending from nucleotide 41557 to nucleotide 41120 was amplified by PCR using the following oligonucleotide primers; 5′-CGGTGCCTAGGTGCACCGACTCCCAGTCC-3′ (SEQ ID NO:129) and 5′-TTTTTCCAAGCGGCTGGCCGTGGACCACGCGTTCCACTCCTCGCACGTCGAGACGAT-3′ (SEQ ID NO:130). DNA from plasmid pCJR65 was used as a template. The design of the primers introduced an AvrII site at nucleotide 41125, and the second primer primed across a BstXI site at nucleotide 41557 and mutated the amino acid sequence YASH (SEQ ID NO:114) to HAFH (SEQ ID NO:117) (encoded by nucleotides 41537-41526).
The 442 bp PCR product was treated with T4 polynucleotide kinase and ligated to plasmid pUC18 that had been linearised by digestion with SmaI and then treated with alkaline phosphatase. The ligation mixture was used to transform electrocompetent E. coli DH10B and individual clones checked for the presence of the desired plasmid pHP40. Plasmid pHP40 was identified by restriction pattern and sequence analysis. Plasmid pHP40 was linearised by digestion with restriction enzymes AvrII and BstXI, and a 427 bp fragment (Fragment C) isolated by gel electrophoresis and purified from the gel. Fragments A, B, and C were ligated together and the resulting ligation mixture used to transform electrocompetent E. coli DH10B. Individual clones were checked for the presence of an insert derived from DEBS1. The resulting plasmid was designated pHP41. Sequence analysis was used to confirm the clone contained the correct motif HAFH (SEQ ID NO:117).
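As a check on the mutagenic primer used in this Example (the 57-nucleotide primer that primes across the BstXI site), the Python sketch below translates it in each of the three forward reading frames using the standard genetic code and reports the frame in which the HAFH motif appears. The function names are hypothetical, and the reading frame is found by searching rather than being taken from the sequence listing.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in the conventional TCAG order.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, frame=0):
    """Translate dna from the given frame, ignoring any incomplete final codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3))


def frames_encoding(dna, motif):
    """Return the forward frames (0, 1, 2) whose translation contains motif."""
    return [frame for frame in range(3) if motif in translate(dna, frame)]


# Mutagenic primer from this Example, written 5' to 3'.
HAFH_PRIMER = "TTTTTCCAAGCGGCTGGCCGTGGACCACGCGTTCCACTCCTCGCACGTCGAGACGAT"

for frame in frames_encoding(HAFH_PRIMER, "HAFH"):
    print(frame, translate(HAFH_PRIMER, frame))
```

The same two functions can be applied to any of the other mutagenic primers in the Examples to confirm which motif they encode.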

Example 2

Construction of S. erythraea NRRL2338 JC2/pHP41 and Production of Triketides

S. erythraea NRRL2338 JC2 contains a deletion of the eryAI, eryAII and eryAIII apart from the TE (Rowe, C. J. et al. Gene 216, 215-223). Plasmid pHP41 was used to transform S. erythraea NRRL2338 JC2 protoplasts using the TE as a homology region. Thiostrepton resistant colonies were selected on R2T20 agar containing 40 μg/ml thiostrepton. S. erythraea NRRL2338 JC2 (pHP41) was plated onto SM3 agar (see patent application WO 00/01827) containing 40 μg/ml thiostrepton and allowed to grow for 11 days at 30° C. Approximately 1 cm² of the agar was homogenised and extracted with a mixture of 1.2 ml ethyl acetate and 20 μl formic acid. The solvent was decanted and removed by evaporation and the residue dissolved in methanol and analysed by GC/MS. The major products were identified by comparison with authentic standards (Oliynyk, M. et al. Chem. Biol. (1996) 3:833-839) as triketide lactones (2S,3R,5R)-2-methyl-3,5-dihydroxy-n-hexanoic δ-lactone (AAP, i.e. Acetate, Acetate, Propionate incorporation), (2S,3R,5R)-2-methyl-3,5-dihydroxy-n-heptanoic δ-lactone (PAP), (2R,3S,4S,5R) 2, 4-dimethyl-3,5-dihydroxy-n-heptanoic δ-lactone (PPP) and (2R,3S,4S,5R) 2, 4-dimethyl-3,5-dihydroxy-n-hexanoic δ-lactone (APP). These products were identified as their ammonium adducts corresponding to exact mass 144, 158, 172 and 158. Four products were produced because in this strain, and under the conditions of the experiment the loading module loads both acetate and propionate and the modified AT loads malonyl-CoA and methylmalonyl-CoA. Only three triketide lactone peaks could be observed in the GC/MS spectra under standard conditions, this was due to the co-elution of the equivalent mass APP and PAP compounds. An isocratic gradient was used to verify this peak was comprised of two components. In further sets of experiments S. erythraea JC2 (pHP41) was used to inoculate 5 ml TSB containing 5 μg/ml thiostrepton. 
After three days growth 1.5 ml of this culture was used to inoculate 25 ml SM3 media containing 5 μg/ml thiostrepton in a 250 ml flask. The flask was incubated at 30° C., 250 rpm for 6 days. At this time the supernatant was adjusted to pH 3.0 with formic acid and extracted twice with an equal volume of ethyl acetate. The solvent was removed by evaporation and the residue analysed by GC/MS. In each experiment we could identify the 4 products AAP, PAP, PPP and APP but the absolute ratios and quantities were variable, presumably depending on exact media and growth conditions within each flask (FIG. 6).
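The four masses quoted in this Example follow from simple bookkeeping: each propionate-derived unit carries one more CH2 group (nominal mass 14) than the corresponding acetate-derived unit. The Python sketch below enumerates the starter and first-extender choices with the second extender fixed as propionate, as in the AAP/PAP/APP/PPP nomenclature used above. The function names are hypothetical, and the [M+NH4]+ values assume a nominal adduct mass of 18, which is our assumption about how the ammonium adducts relate to the quoted masses.

```python
from itertools import product

AAP_MASS = 144  # nominal mass of the AAP lactone, from the text
CH2 = 14        # nominal mass difference between acetate and propionate units
NH4 = 18        # nominal mass of the ammonium adduct ion (assumption)


def triketide_mass(code):
    """Nominal mass of a triketide lactone from its three-letter unit code."""
    extra_ch2 = code.count("P") - "AAP".count("P")
    return AAP_MASS + CH2 * extra_ch2


def enumerate_triketides():
    """Map each starter/first-extender combination to (M, nominal [M+NH4]+)."""
    table = {}
    for starter, extender in product("AP", repeat=2):
        code = starter + extender + "P"
        mass = triketide_mass(code)
        table[code] = (mass, mass + NH4)
    return table


for code, (mass, adduct) in enumerate_triketides().items():
    print(code, mass, adduct)
```

APP and PAP share the same nominal mass, which is consistent with the co-elution issue noted above: mass alone cannot separate them.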

Example 3

Construction of S. erythraea NRRL2338 (pHP41) and Its Use to Produce 12-desmethyl Erythromycin B

Plasmid pHP41 was used to transform S. erythraea NRRL2338 protoplasts. Thiostrepton resistant colonies were selected on R2T20 agar containing 40 μg/ml thiostrepton. Several clones were tested for the presence of pHP41 integrated into the chromosome by Southern blot hybridisation of their genomic DNA with DIG labelled vector DNA. A clone with a correctly integrated copy of pHP41 was identified in this way. S. erythraea NRRL2338 (pHP41) was used to inoculate 5 ml TSB containing 5 μg/ml thiostrepton. After three days growth 1.5 ml of this culture was used to inoculate 25 ml EryP media (see patent application WO 00/00500) containing 5 μg/ml thiostrepton in a 250 ml flask. The flask was incubated at 30° C., 250 rpm for 6 days. At this time the supernatant was adjusted to pH 9.0 with ammonia and extracted twice with an equal volume of ethyl acetate. The solvent was removed by evaporation and the residue analysed by HPLC/MS. A peak of molecular mass m/z (M+H)=704 was observed, as required for C-12 desmethyl erythromycin B, in addition to a peak corresponding to erythromycin A (M+H)=734. Other peaks corresponding to partially processed erythromycin intermediates could be identified.
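The two m/z values quoted in this Example can be reproduced from molecular formulae. The Python sketch below computes monoisotopic (M+H) values; the formula for erythromycin A (C37H67NO13) is the commonly quoted one, while the formula for 12-desmethyl erythromycin B is our assumption, obtained by removing CH2 from erythromycin B (C37H67NO12). The names used are hypothetical.

```python
# Monoisotopic masses of the most abundant isotopes, and of the proton.
MONOISOTOPIC = {"C": 12.0, "H": 1.007825, "N": 14.003074, "O": 15.994915}
PROTON = 1.007276


def mh_plus(formula):
    """Monoisotopic m/z of the singly protonated molecule for an element-count dict."""
    neutral = sum(MONOISOTOPIC[element] * count for element, count in formula.items())
    return neutral + PROTON


ERYTHROMYCIN_A = {"C": 37, "H": 67, "N": 1, "O": 13}
# Assumed formula: erythromycin B (C37H67NO12) less one CH2 group.
DESMETHYL_ERYTHROMYCIN_B = {"C": 36, "H": 65, "N": 1, "O": 12}

for name, formula in [
    ("erythromycin A", ERYTHROMYCIN_A),
    ("12-desmethyl erythromycin B", DESMETHYL_ERYTHROMYCIN_B),
]:
    print(name, round(mh_plus(formula), 3))
```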

Example 4

Construction of Plasmid pHP048

Plasmid pHP048 is a pCJR24-based plasmid containing the DEBS1 PKS gene comprising a loading module, the first and second extension modules of DEBS1 and the chain terminating thioesterase. The motif YASH of the AT domain of first module has been altered to HASH. Plasmid pHP048 was constructed by several intermediate plasmids as follows.

A DNA segment of the eryAI gene from S. erythraea extending from nucleotide 41557 to nucleotide 41120 was amplified by PCR using the following oligonucleotide primers; 5′-CGGTGCCTAGGTGCACCGACTCCCAGTCC-3′ (SEQ ID NO:129) and 5′-TTTTTCCAAGCGGCTGGCCGTGGACCACGCGTCGCACTCCTCGCACGTCGAGACGAT-3′ (SEQ ID NO:131). The DNA from plasmid pCJR65 was used as a template. The design of the primers introduced an AvrII site at nucleotide 41125; the second primer extended to a BstXI site at nucleotide 41557 and also mutated the amino acid sequence YASH (SEQ ID NO:114) (encoded by nucleotides 41537-41526) to HASH (SEQ ID NO:121). The 442 bp PCR product was treated with T4 polynucleotide kinase and ligated to plasmid pUC18 that had been linearised by digestion with SmaI and then treated with alkaline phosphatase. The ligation mixture was used to transform electrocompetent E. coli DH10B and individual clones checked for the presence of the desired plasmid pHP022. Plasmid pHP022 was identified by restriction pattern and sequence analysis. Plasmid pHP022 was linearised by digestion with restriction enzymes AvrII and BstXI, and the fragment (Fragment D) isolated by gel electrophoresis and purified from the gel. Fragment D was ligated with Fragments A and B described previously and the resulting ligation mixture used to transform electrocompetent E. coli DH10B. Individual clones were checked for the presence of an insert derived from DEBS1. The resulting plasmid was designated pHP048. Sequence analysis was used to confirm the clone contained the correct motif HASH (SEQ ID NO:121).

Example 5

Construction of S. erythraea NRRL2338 JC2 (pHP048) and Its Use to Produce Triketides

S. erythraea NRRL2338 JC2 contains a deletion of the eryAI, eryAII and eryAIII apart from the TE (Rowe, C. J. et al. Gene 216, 215-223). Plasmid pHP048 was used to transform S. erythraea NRRL2338 JC2 protoplasts using the TE as a homology region. Thiostrepton resistant colonies were selected on R2T20 agar containing 40 μg/ml thiostrepton. S. erythraea JC2 (pHP048) was used to inoculate 5 ml TSB containing 5 μg/ml thiostrepton. After three days growth 1.5 ml of this culture was used to inoculate 25 ml SM3 media containing 5 μg/ml thiostrepton in a 250 ml flask. The flask was incubated at 30° C., 250 rpm for 6 days. At this time the supernatant was adjusted to pH 3.0 with formic acid and extracted twice with an equal volume of ethyl acetate. The solvent was removed by evaporation and the residue analysed by GC/MS. A mixture of products was identified as their ammonium adducts corresponding to the AAP, PAP, APP and PPP triketide lactones as described in Example 2. In this example, under the media/growth conditions described, the PKS with the HASH (SEQ ID NO:121) change is more catalytically active than the PKS with the HAFH (SEQ ID NO:117) change (Example 2) as judged by total amounts of triketide lactone produced; however, in this case the modified PKS appears to display lower selectivity towards acetate as judged by the ratio of AAP to PPP triketide lactone.

Example 6

Construction of Plasmid pHP47

Plasmid pHP47 is a pCJR24-based plasmid containing the DEBS1 PKS gene comprising a loading module, the first and second extension modules of DEBS1 and the chain terminating thioesterase. The motif YASH of the AT domain of first module has been altered to VAGH. Plasmid pHP47 was constructed by several intermediate plasmids as follows.

A DNA segment of the eryAI gene from S. erythraea extending from nucleotide 41557 to nucleotide 41120 was amplified by PCR using the following oligonucleotide primers; 5′-CGGTGCCTAGGTGCACCGACTCCCAGTCC-3′ (SEQ ID NO:129) and 5′-TTTTTCCAAGCGGCTGGCCGTGGACGTCGCGGGGCACTCCTCGCACGTCGAGACGAT-3′ (SEQ ID NO:130). The DNA from plasmid pCJR65 was used as a template. The design of the primers introduced an AvrII site at nucleotide 41125; the second primer extended to a BstXI site at nucleotide 41557 and also mutated the amino acid sequence YASH (SEQ ID NO:114) (encoded by nucleotides 41537-41526) to VAGH (SEQ ID NO:122). The 442 bp PCR product was treated with T4 polynucleotide kinase and ligated to plasmid pUC18 that had been linearised by digestion with SmaI and then treated with alkaline phosphatase. The ligation mixture was used to transform electrocompetent E. coli DH10B and individual clones checked for the presence of the desired plasmid pHP46. Plasmid pHP46 was identified by restriction pattern and sequence analysis. Plasmid pHP46 was linearised by digestion with restriction enzymes AvrII and BstXI, and the fragment (Fragment E) isolated by gel electrophoresis and purified from the gel. Fragment E was ligated with Fragments A and B described previously and the resulting ligation mixture used to transform electrocompetent E. coli DH10B. Individual clones were checked for the presence of an insert derived from DEBS1. The resulting plasmid was designated pHP47. Sequence analysis was used to confirm the clone contained the correct motif VAGH (SEQ ID NO:122).
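The three mutagenic primers used in Examples 1, 4 and 6 are identical outside the codons that encode the four-residue motif. The Python sketch below makes this visible by listing the 0-based positions at which each pair of primers differs; the dictionary keys (named after the motif each primer is stated to introduce) and the function name are hypothetical labels chosen for illustration.

```python
from itertools import combinations

# Mutagenic primers from Examples 1, 4 and 6, written 5' to 3'.
PRIMERS = {
    "HAFH": "TTTTTCCAAGCGGCTGGCCGTGGACCACGCGTTCCACTCCTCGCACGTCGAGACGAT",
    "HASH": "TTTTTCCAAGCGGCTGGCCGTGGACCACGCGTCGCACTCCTCGCACGTCGAGACGAT",
    "VAGH": "TTTTTCCAAGCGGCTGGCCGTGGACGTCGCGGGGCACTCCTCGCACGTCGAGACGAT",
}


def differing_positions(first, second):
    """Return the 0-based positions at which two equal-length sequences differ."""
    if len(first) != len(second):
        raise ValueError("sequences must have the same length")
    return [i for i, (x, y) in enumerate(zip(first, second)) if x != y]


for name_a, name_b in combinations(PRIMERS, 2):
    diffs = differing_positions(PRIMERS[name_a], PRIMERS[name_b])
    print(name_a, "vs", name_b, diffs)
```

All differences fall within a short window of the primer, so the flanking sequence, including the BstXI site, is shared by the three constructs.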

Example 7

Construction of Plasmid pLS007

Plasmid pLS007 contains the crotonyl-CoA reductase (CCR) gene from S. cinnamonensis that is believed to influence the level of ethylmalonyl-CoA within the cell. Plasmid pSG142 (Gaisser et al. Mol. Microbiol. (2000) 36:391-401) places genes under the control of the actI promoter and can be used to integrate either in the right hand side of the erythromycin gene cluster or in the act promoter region of a previously transformed actinomycete. Two oligonucleotide primers; 5′-GGCAAACATATGAAGGAAATCCTGGACGCG-3′ (SEQ ID NO:133) and 5′-TCCGCGGATCCTCAGTGCGTTCAGATCAGTGC-3′ (SEQ ID NO:134) were used to amplify the S. cinnamonensis CCR gene using genomic DNA as template. The design of the primers incorporated NdeI and BamHI restriction sites to facilitate cloning. The 1.4 kb PCR product was isolated by gel electrophoresis and purified from the gel and ligated with pSG142 that had been digested with NdeI and BglII. The resulting ligation mixture was used to transform electrocompetent E. coli DH10B cells. Plasmid pLS003 was identified by restriction analysis and sequencing to ensure errors were not introduced during amplification. A discrepancy with the published sequence was identified. However, further analysis by comparison with other published CCR protein sequences indicated pLS003 was correct. Plasmid pLS003 was digested with NdeI and XbaI and the resulting 4.5 kb fragment (Fragment F) isolated by gel electrophoresis and purified from the gel. This fragment was ligated to pLSB2, a derivative of pKC1132 containing the actI/actII promoter region behind an NdeI site. Plasmid pLSB2 was digested with NdeI and XbaI and the resulting ~4 kb fragment (Fragment G) isolated by gel electrophoresis and purified from the gel. Fragments F and G were ligated together and the resulting ligation mixture was used to transform electrocompetent E. coli DH10B cells. Plasmid pLS007 was identified by restriction analysis.
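The statement that the CCR primers incorporate NdeI and BamHI sites can be checked by searching the primer sequences for the recognition sequences of these enzymes. In the Python sketch below the recognition sequences (CATATG for NdeI, GGATCC for BamHI, AGATCT for BglII) are the standard ones; the function and variable names are hypothetical. Because all three sites are palindromic, only the strand as written needs to be searched.

```python
import re

RECOGNITION_SITES = {"NdeI": "CATATG", "BamHI": "GGATCC", "BglII": "AGATCT"}


def find_sites(seq, sites=RECOGNITION_SITES):
    """Return {enzyme: [0-based start positions]} for every site present in seq."""
    seq = seq.upper()
    found = {}
    for enzyme, site in sites.items():
        # A lookahead is used so that overlapping matches would also be reported.
        positions = [m.start() for m in re.finditer("(?=" + site + ")", seq)]
        if positions:
            found[enzyme] = positions
    return found


# CCR primers from this Example, written 5' to 3'.
CCR_FORWARD = "GGCAAACATATGAAGGAAATCCTGGACGCG"
CCR_REVERSE = "TCCGCGGATCCTCAGTGCGTTCAGATCAGTGC"

print(find_sites(CCR_FORWARD))
print(find_sites(CCR_REVERSE))
```

BamHI and BglII are generally regarded as producing compatible cohesive ends (both leave a GATC overhang), which is consistent with joining a BamHI-ended insert to BglII-digested pSG142 as described above.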

Example 8

Construction of S. erythraea NRRL2338 JC2 (pHP47/pLS007) and Its Use to Produce Triketides

S. erythraea NRRL2338 JC2 contains a deletion of the eryAI, eryAII and eryAIII apart from the TE (Rowe, C. J. et al. Gene 216, 215-223). Plasmid pHP47 was used to transform S. erythraea NRRL2338 JC2 protoplasts using the TE as a homology region. Thiostrepton resistant colonies were selected on R2T20 agar containing 40 μg/ml thiostrepton. pLS007 was used to transform protoplasts of S. erythraea NRRL2338 JC2 (pHP47); thiostrepton and apramycin resistant clones were selected on R2T20 agar containing 40 μg/ml thiostrepton and 50 μg/ml apramycin plus 10 mM magnesium chloride, and the resistance markers were verified by plating on tapwater media containing the same antibiotics. S. erythraea NRRL2338 JC2 (pHP47/pLS007) was used to inoculate 5 ml TSB containing 5 μg/ml thiostrepton and 50 μg/ml apramycin. After three days growth 1.5 ml of this culture was used to inoculate 25 ml SM3 media containing 5 μg/ml thiostrepton and 50 μg/ml apramycin in a 250 ml flask. The flask was incubated at 30° C., 250 rpm for 6 days. At this time the supernatant was adjusted to pH 3.0 with formic acid and extracted twice with an equal volume of ethyl acetate. The solvent was removed by evaporation and the residue analysed by GC/MS. In this experiment amounts of triketide product were lower, but a mixture of products could be identified as their ammonium adducts corresponding to exact masses 158, 172 and 186.

Example 9

Construction of S. erythraea NRRL2338 (pHP47) and Its Use to Produce Erythromycins

Plasmid pHP47 was used to transform S. erythraea NRRL2338 protoplasts. Thiostrepton resistant colonies were selected on R2T20 agar containing 40 μg/ml thiostrepton. S. erythraea NRRL2338 (pHP47) was used to inoculate 5 ml TSB containing 5 μg/ml thiostrepton. After three days' growth 1.5 ml of this culture was used to inoculate 25 ml EryP media containing 5 μg/ml thiostrepton in a 250 ml flask. The flask was incubated at 30° C., 250 rpm for 6 days. At this time the supernatant was adjusted to pH 9.0 with ammonia and extracted twice with an equal volume of ethyl acetate. The solvent was removed by evaporation and the residue analysed by HPLC/MS. Peaks of mass m/z (M+H)=734 corresponding to erythromycin A were observed.

Example 10

Construction of Plasmid pSGK051

pSGK051 is a pPFL43-based plasmid (WO 00/00500). The motif HAFH (SEQ ID NO:117) of the AT domain of the loading domain has been altered to YASH (SEQ ID NO:114). Plasmid pSGK051 was constructed via several intermediate plasmids as follows.

Plasmid pPFL43 was linearised by digestion with restriction enzymes NcoI and NotI and an 858 bp fragment (Fragment Q) isolated by gel electrophoresis and purified from the gel.

A DNA segment of the monensin loading domain from nucleotide 16360-17366 (see FIG. 5 and PCT/GB00/02072) was amplified by PCR using the following oligonucleotide primers:

(SEQ ID NO: 135) 5′-GGGGACGCGGCCGCAAGGCCCACCACCTGAAGGTCAGCTACGCCTCCCACTCCCCGCACATGGACCCCAT-3′ and (SEQ ID NO: 136) 5′-GGCTAGCGGGTCCTCGTCCGTGCCGAGGTCA-3′. The first primer spans the NotI site at nucleotide 16367 and changes the amino acid sequence HAFH (SEQ ID NO:117) to YASH (SEQ ID NO:114) at nucleotides 16398-16409; the second introduces an NheI site equivalent to that in pPFL43. The DNA from plasmid pPFL43 was used as a template. The 1006 bp PCR product was treated with T4 polynucleotide kinase and ligated to plasmid pUC18 that had been linearised by digestion with SmaI and treated with alkaline phosphatase. The ligation mixture was used to transform electrocompetent E. coli DH10B and individual clones were checked for the presence of the desired plasmid pCSAT9. Plasmid pCSAT9 was identified by restriction pattern and sequence analysis. Plasmid pCSAT9 was linearised by digestion with restriction enzymes NotI and NheI and a 995 bp fragment (Fragment R) isolated by gel electrophoresis and purified from the gel. Plasmid pPFL43 was digested with NcoI and NheI to remove a 1.8 kb fragment and the larger fragment (Fragment S) isolated by gel electrophoresis and purified from the gel. Fragments Q, R and S were ligated together and the resulting ligation mixture used to transform electrocompetent E. coli DH10B. Individual clones were checked for the desired plasmid pSGK051. The resulting plasmid was analysed by restriction digest and sequenced to confirm the presence of the correct motif YASH (SEQ ID NO:114).
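That the mutagenic primer (SEQ ID NO: 135) carries both the NotI site and codons for the YASH motif can be confirmed by translating it in each forward reading frame. The following is a minimal sketch under stated assumptions: it uses the standard genetic code and the standard NotI recognition sequence GCGGCCGC, and the names `CODON_TABLE`, `translate` and `motif_frames` are illustrative, not part of the original method.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Primer SEQ ID NO: 135 as given in the text (5' to 3').
PRIMER = ("GGGGACGCGGCCGCAAGGCCCACCACCTGAAGGTCAGCTACGC"
          "CTCCCACTCCCCGCACATGGACCCCAT")
NOTI = "GCGGCCGC"  # assumed standard NotI recognition sequence


def translate(seq, frame):
    """Translate seq from a 0-based frame offset, ignoring a partial last codon."""
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


frames = {f: translate(PRIMER, f) for f in range(3)}
motif_frames = [f for f, pep in frames.items() if "YASH" in pep]

print("NotI site at index:", PRIMER.find(NOTI))
for f, pep in frames.items():
    print(f"frame {f}: {pep}")
```

Only one of the three forward frames yields a peptide containing YASH, which is the frame that would need to be in register with the AT domain coding sequence.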

Example 11

Construction of S. erythraea NRRL2338 JC2/pSGK051 and Production of Triketides

Plasmid pSGK051 was used to transform S. erythraea NRRL2338 JC2 protoplasts using the TE as a homology region. Thiostrepton resistant colonies were selected on R2T20 agar containing 40 μg/ml thiostrepton. S. erythraea NRRL2338 JC2 (pSGK051) was plated onto R2T20 agar containing 40 μg/ml thiostrepton and allowed to grow for 11 days at 30° C. Approximately 1 cm² of the agar was homogenised and extracted with a mixture of 1.2 ml ethyl acetate and 20 μl formic acid. The solvent was decanted and removed by evaporation and the residue dissolved in methanol and analysed by GC/MS. The major products were identified by comparison with authentic standards as triketide lactones (2S,3R,4S,5R)-2,4-dimethyl-3,5-dihydroxy-n-heptanoic δ-lactone and (2S,3R,4S,5R)-2,4-dimethyl-3,5-dihydroxy-n-hexanoic δ-lactone.

Example 12

Construction of S. erythraea NRRL2338 (pSGK051) and Its Use to Produce Erythromycins

Plasmid pSGK051 was used to transform S. erythraea NRRL2338 protoplasts. Thiostrepton resistant colonies were selected on R2T20 agar containing 40 μg/ml thiostrepton. S. erythraea NRRL2338 (pSGK051) was plated onto R2T20 agar containing 40 μg/ml thiostrepton and allowed to grow for 10 days at 30° C. Approximately 2 cm² of the agar was homogenised and extracted with a mixture of 1.2 ml ethyl acetate and 20 μl dilute ammonia. The solvent was decanted and removed by evaporation and the residue analysed by HPLC/MS. Peaks of mass m/z (M+H)=734 and 720 could be observed alongside likely products of incomplete processing. Comparison with authentic standards showed that the compounds produced were erythromycin A and 13-methyl erythromycin A.

1. An isolated nucleic acid encoding a polyketide synthase (PKS) enzyme complex including an altered acyltransferase (AT) domain, wherein said AT domain has been altered to substitute selectively fewer than 9 amino acid residues of the AT domain with a different amino acid residue, the substituted residue(s) consisting of one or more residues of one or more motifs which are present in the active site pocket of the AT domain and which influence the substrate specificity of the AT domain, at least one substitution affecting the substrate specificity, wherein said substituted residue(s) consists of: (a) all or part of the four-residue sequence corresponding to the YASH motif (SEQ ID NO: 114) of the AT domain of the first module of DEBS (6-deoxyerythronolide B synthase); and, optionally, one or more selected from: (b) the residue that is immediately downstream of said four-residue sequence corresponding to the YASH motif of the AT domain of the first module of DEBS; (c) the residue that is immediately downstream of the GHS motif; (d) the residue that is three residues downstream of the GQG motif; and (e) the residue that is four residues downstream of the conserved arginine residue as found at position 144 of SEQ ID NO:26, wherein the SEQ ID NO:26 is the AT domain of the first module of DEBS.
 2. A vector comprising the nucleic acid according to claim 1.
 3. An isolated host cell comprising the nucleic acid according to claim 1 and able to express the PKS enzyme complex.
 4. The host cell according to claim 3 which is adapted to synthesize a polyketide resulting from the action of the PKS enzyme complex.
 5. A method of synthesizing a polyketide synthase (PKS) enzyme complex, said method comprising culturing the host cell of claim 3 under suitable conditions and expressing the PKS enzyme complex.
 6. The nucleic acid of claim 1, wherein the PKS enzyme complex comprises DEBS.
 7. The nucleic acid according to claim 1, wherein said AT domain has been altered so as to alter a motif selected from XAFH (SEQ ID NO:137), XASH (SEQ ID NO: 138), and XAGH (SEQ ID NO:116) and/or to create such a motif, wherein X is any amino acid.
 8. The nucleic acid according to claim 7, wherein the motif is XAGH (SEQ ID NO: 116), and X is selected from Phe, Thr, Val and His.
 9. The nucleic acid according to claim 7, wherein the motif is XAFH (SEQ ID NO:137), and X is His.
 10. The nucleic acid according to claim 7, wherein the motif is XASH (SEQ ID NO: 138), and X is selected from Tyr, His, Trp and Val.
 11. The nucleic acid according to claim 1, wherein said four-residue sequence corresponding to the YASH motif of the AT domain of the first module of DEBS has been substituted with a different amino acid to produce or alter a motif containing proline.
 12. The nucleic acid according to claim 1, wherein said four-residue sequence corresponding to the YASH motif of the AT domain of the first module of DEBS has been substituted with a different amino acid followed by S which was produced by amino acid substitution if not already present.
 13. The nucleic acid according to claim 1, wherein said four-residue sequence corresponding to the YASH motif of the AT domain of the first module of DEBS has been substituted with a different amino acid to produce a motif specific for methylmalonyl-CoA, and the motif is followed by Ser, Gly, Cys or Thr which was produced by amino acid substitution if not already present.
 14. The nucleic acid according to claim 1, wherein said AT domain has been substituted with a different amino acid to produce a motif specific for methylmalonyl-CoA, and the residue following the GHS motif in the active site is Gln which was produced by amino acid substitution if not already present.
 15. The nucleic acid according to claim 1, wherein said AT domain has been substituted with a different amino acid to produce a motif specific for malonyl-CoA, and the residue following the GHS motif in the active site is Val, Ile or Leu which was produced by amino acid substitution if not already present.
 16. The nucleic acid according to claim 1, wherein said AT domain has been substituted with a different amino acid to produce a motif specific for methylmalonyl-CoA, and the residue that is 3 residues downstream of the GQG motif is Trp which was produced by amino acid substitution if not already present.
 17. The nucleic acid according to claim 1, wherein said AT domain has been substituted with a different amino acid to produce a motif specific for malonyl-CoA, and the residue that is 3 residues downstream of the GQG motif is Arg, His or Thr which was produced by amino acid substitution if not already present.
 18. The nucleic acid according to claim 1, wherein said AT domain has been substituted with a different amino acid to produce a motif specific for malonyl-CoA and the residue that is 4 residues downstream of the conserved Arg as found at position 144 of SEQ ID NO:26 is Met which was produced by amino acid substitution if not already present.
 19. The nucleic acid according to claim 1, wherein said AT domain has been substituted with a different amino acid to produce a motif specific for methylmalonyl-CoA and the residue that is 4 residues downstream of the conserved Arg as found at position 144 of SEQ ID NO:26 is Ile or Leu which was produced by amino acid substitution if not already present.
 20. The nucleic acid according to claim 1, wherein said AT domain has been substituted with a different amino acid to produce a motif specific for ethylmalonyl-CoA and the residue that is 4 residues downstream of the conserved Arg as found at position 144 of SEQ ID NO:26 is Trp which was produced by amino acid substitution if not already present.
 21. The nucleic acid according to claim 1, which encodes a PKS which includes, in addition to said AT domain, a natural cognate ACP domain which, prior to the amino acid substitution, is adapted to receive a substrate transferred by the AT; and the substitution causes the AT to transfer a different substrate to said cognate ACP domain.
 22. The nucleic acid of claim 1 wherein said substituted residue(s) comprise the residue that is immediately downstream of said four-residue sequence corresponding to the YASH motif of the AT domain of the first module of DEBS.
 23. The nucleic acid of claim 1 wherein said substituted residue(s) comprise the residue that is immediately downstream of the GHS motif.
 24. The nucleic acid of claim 1 wherein said substituted residue(s) comprise the residue that is three residues downstream of the GQG motif.
 25. The nucleic acid of claim 1 wherein said substituted residue(s) comprise the residue that is four residues downstream of the conserved arginine residue as found at position 144 of SEQ ID NO:26. 